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Abstract — The  framework  of  kernel  regression  [1],  a  non- 
parametric  estimation  method,  has  heen  widely  used  in  different 
guises  for  solving  a  variety  of  image  processing  problems 
including  denoising  and  interpolation  [2].  In  this  paper,  we 
extend  the  use  of  kernel  regression  for  dehlurring  applica¬ 
tions.  Furthermore,  we  show  that  many  of  the  popular  image 
reconstruction  techniques  are  special  cases  of  the  proposed 
framework.  Simulation  results  confirm  the  effectiveness  of  our 
proposed  methods. 


This  paper  is  organized  as  follows.  In  Section  11,  we 
briefly  review  the  kernel  regression  framework  and  formulate 
a  deblurring  estimator.  Section  III  illustrates  two  experiments 
on  simulated  data  sets,  and  we  conclude  this  paper  in  the  last 
section. 

II.  Kernel  Deblurring 

A.  Review 


I.  Introduction 

In  our  earlier  work  [2],  [3],  we  studied  the  kernel  regression 
(KR)  framework,  and  proposed  the  data-adaptive  version  of 
kernel  regression  for  use  in  image  and  video  processing. 
The  applicability  of  data-adapted  kernel  regression  is  wide- 
ranging,  for  example,  image  denoising  (including  white  Gaus¬ 
sian,  Laplacian,  Salt  &  Pepper,  Compression  artifacts,  and 
Color  artifacts),  and  image  interpolation/reconstruction  from 
regular  and  irregularly  sampled  data  sets  (e.g.  image  fusion 
and  super-resolution).  However,  since  these  direct  applica¬ 
tions  of  KR  neglected  the  atmosphere  or  camera’s  point 
spread  function  (blur)  effects,  the  estimated  signals  (images) 
need  further  processing.  Such  a  two-step  filtering  process  (e.g. 
denoising  +  deblurring)  is  in  general  suboptimal  [4].  In  this 
paper,  we  develop  a  one-step  procedure  for  denoising  and 
deblurring,  based  on  the  kernel  regression  framework. 

To  date,  the  kernel  regression  framework  has  never  been 
directly  used  for  deblurring.  As  is  well-known,  deblurring 
is  an  ill-posed  problem,  and  it  requires  an  appropriate  reg¬ 
ularization  which  inttoduces  prior  information  about  the 
desired  signals.  For  piecewise  constant  signals,  the  total 
variation  (TV)  regularization  (Li-norm),  proposed  in  [5],  is 
a  better  choice  than  the  Tikhonov  regularization  (L2-norm) 
for  denoising  [6],  [7],  [8]  as  well  as  deblurring  [9],  [10] 
applications.  The  implicit  assumption  of  piecewise  constancy 
in  TV  regularization  prevents  estimated  signals  from  having 
fine  texture  and  gradation.  Hence,  the  question  which  we 
must  consider  is  how  to  more  effectively  estimate  signals  with 
texture  and  gradation.  In  this  paper,  we  incorporate  the  theory 
of  kernel  regression  and  propose  a  deblurring  algorithm  with 
a  suitable  regularization. 
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and  by  the  National  Science  Foundation  Science  and  Technology  Center  for 
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The  kernel  regression  framework  defines  its  data  model  in 
2-D  as 


Tji  =  z{yii)  +  Ei,  t  =  yii  =  [xii,X2i]^ ,  (1) 


where  yi  is  a  noisy  sample  at  x^,  z(-)  is  the  (hitherto 
unspecified)  regression  function  to  be  estimated,  Si  is  an  i.i.d 
zero  mean  noise,  and  P  is  the  total  number  of  samples  in 
a  neighborhood  (window)  of  interest.  As  such,  the  kernel  re¬ 
gression  framework  provides  a  rich  mechanism  for  computing 
point-wise  estimates  of  the  regression  function  with  minimal 
assumptions  about  global  signal  or  noise  models. 

While  the  specific  form  of  z{-)  may  remain  unspecified,  we 
can  rely  on  a  generic  local  expansion  of  the  function  about  a 
sampling  point  x^.  Specifically,  if  x  is  near  the  sample  at  x^, 
we  have  the  {N  +  l)-term  Taylor  series^ 

z{yii)  «  z(x)  -f  {Vz(x)}^(xj  -  x) 

-h^(xi  -  x)'^{7iz(x)}(xi  -  x)  H -  (2) 

=  /3o+/3?’(xi-x)-l-/3^vech{(xj-x)(xi-x)^H - (3) 


where  V  and  T-L  are  the  gradient  (2  x  1)  and  Hessian  (2  x  2) 
operators,  respectively,  and  vech(-)  is  the  half-vectorization 
operator  which  lexicographically  orders  the  lower  triangular 
portion  of  a  symmetric  matrix.  Furthermore,  (3o  is  z(x),  which 
is  the  pixel  value  of  interest,  and  the  vectors  (3^  and  ^2 


/3i 

/32 


'  dz{-x) 
dxi 

'a"z(x) 

2dx\ 


n  T 


i9x(x) 

dx2 
9"z(x) 
dxidx2 


9^z(x) 
28x2  j 


(4) 

(5) 


Since  this  approach  is  based  on  local  approximations,  a 
logical  step  to  take  is  to  estimate  the  parameters  {/3„}^_g 


'other  localized  representations  are  also  possible  and  may  be  advanta¬ 
geous. 
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from  all  the  samples  while  giving  the  nearby  samples 

higher  weights  than  samples  farther  away.  A  formulation  of 
the  fitting  problem  capturing  this  idea  is  to  solve  the  following 
optimization  problem, 

p 

\yi  - 1^0-  (x*  -  x) 


-/3^vech  {(xi  -  x)(xi  -  x)^ 


m 

•••}  A:Hi(xi  -  x)(6) 


with 

i^H,(x.-x)=^ir(H-(x.-x)),  (7) 

where  N  is  the  regression  order,  m  is  the  error  norm  param¬ 
eter  typically  set  to  2,  K{-)  is  the  kernel  function  (a  radially 
symmetric  function),  and  is  the  smoothing  (2  x  2)  matrix 
which  dictates  the  “footprint”  of  the  kernel  function.  Although 
the  choice  of  the  particular  form  of  the  kernel  is  open,  it  has 
a  relatively  small  effect  on  the  accuracy  of  estimation  [11]. 
More  details  about  the  optimization  problem  above  can  be 
found  in  [2],  [3]. 

Following  our  previous  work  [2],  we  also  briefly  review 
three  types  of  the  kernel  functions. 

1)  Classic  kernel: 


'f^Hi(xi  —  x),  with  Hi  =  hi,  (8) 


where  and  Za;2(-)  are  the  first  derivatives  along  xi  and 

X2  directions  and  Wi  is  a  local  analysis  window  around  the 
position  of  interest.  The  intuitive  idea  behind  the  superior 
performance  of  the  steering  kernel  is  to  rely  on  the  image 
structure  information  (given  by  the  local  dominant  orienta¬ 
tion  [15]),  instead  of  using  radiometric  distances  directly  to 
make  the  data-adaptive  kernels.  Since  the  regularized  local 
orientation  estimate  as  formulated  in  [2]  has  strong  tolerance 
to  noise,  the  steering  kernel  is  more  stable  than  the  bilateral 
kernel. 

The  optimization  problem  (6)  eventually  provides  a  point- 
wise  estimator  of  the  regression  function.  However,  as  stated 
in  the  previous  section,  the  data  model  (1)  ignores  other 
distortion  effects.  In  the  following  section,  we  use  a  blurred 
and  noise-ridden  data  model  and  derive  a  KR  based  deblur- 
ring/denoising  estimator. 

B.  Deblurring  Estimator  and  Related  Methods 

Defining  the  shift-invariant  point  spread  function  (PSF)  as 
5(x)  and  the  desired  true  function  as  m(x),  we  consider  the 
blurred  and  noise-ridden  data  model 

y  =  z(x) -f  e  =  &(x)*  u(x) -f  e,  (12) 

where  *  is  the  convolution  operator.  For  convenience,  we 
write  the  point-wise  model  in  vector  form  as: 


where  h  is  the  global  (spatial)  smoothing  parameter.  This  is 
the  standard  choice  of  the  kernel  function  and  it  is  a  function 
of  only  spatial  information  (spatial  distances).  However  the 
classic  kernel  suffers  from  a  limitation  due  to  the  local  linear 
action  on  the  given  data.  Thus,  in  [2],  we  proposed  two  data- 
adapted  kernel  functions;  bilateral  kernel  and  steering  kernel, 
which  depend  not  only  on  spatial  information  but  also  on 
radiometric  information. 

2)  Bilateral  kernel: 

-  x)Kh,.iyi  -  y),  with  =  hi,  (9) 

where  hr  is  the  global  radiometric  smoothing  parameter.  In 
[2],  we  showed  that  the  popular  bilateral  Alter  [12],  [13] 
is  a  special  case  of  the  general  KR  denoising  approach 
using  the  above  bilateral  kernel.  The  bilateral  kernel  takes 
radiometric  distances  explicitly  into  account,  which  limits  its 
performance,  particularly  when  the  measurements  are  very 
noisy. 

3)  Steering  kernel: 

ATHf-'  (xi  -  x) ,  with  Hf  =  hC  I ,  (10) 


Y  =  Z-f  £  =  BU-f  £,  (13) 

where  Y  is  the  blurry  and  noisy  measured  image,  Z  is  the 
blurry  image,  B  is  the  blur  operator,  and  U  is  the  image 
of  interest.  The  underline  notation  denotes  matrices  that  are 
lexicographically  ordered  into  column-stack  vectors: 


Vi 

■  2:(xi)  ■ 

m(xi) 

£i 

Y  = 

,Z  = 

,u  = 

,£  = 

.  Vp  . 

.  ^(Xp)  . 

.  u(Xp)  _ 

.  . 

-P 

(14) 


Using  the  above  notations,  we  also  rewrite  the  point-wise 
Taylor  expansion  (2)  in  vector  form  as  (see  Appendix  for  the 
derivation) 

+  'IlxIVi  +  Z^j2.2UiU2  +  -I-  •  •  •  ^ 

=  s-;^s-;=b(u  +  u,^ui  +  u,,u2 

-I-  11x1X2'^^'^^"^  +  Ua;2U2  H - ^ 

=  (15) 


is  a  more  robust  choice  for  the  data-adaptive  kernel  functions 
[14],  [2].  We  call  the  steering  matrix,  and  a  naive 

estimate  of  the  covariance  matrix  may  be  obtained  as 
follows: 


'^XjGWi  ^2;i(Xj)Za;2(Xj) 


E 

E 


Xj  Gwi 
Xj  Gwi 


ZxA^j)Zx2{^j) 

Zx2{^j)Zx2{^j)  ’ 


(11) 


with 


B  BI„, 

BI„2 

BI„2 

1 

-  - Jj  1 

Ul. 

ur2 

— -i-l 

—X\X2 

U^2  •  •  • 

■^2 

T 

,  (16) 

where  is  the  ui-pixel  shift  operator  along  the  xi  direction, 
7ixi  and  Z^,^  are  the  first  and  the  second  derivative  in  the 


Xi  direction  respectively,  and  1^^  =  diag{t;i,  •  •  •  ,  fi}.  In  the 
absence  of  other  modeling  errors  and  noise,  the  approximation 
suggests  the  following  constraint: 


z- s-;'is-;"BU  =  g.  (i?) 


Since  Taylor  approximation  with  a  hnite  number  of  terms  is 
only  valid  when  vi  and  V2  are  small,  it  suggests  a  likelihood 
term  with  larger  weights  for  smaller  vi,  v^'- 


Cl(U)  =  EEfY-S-S-BU 

Vi  V2 


m 

W(v)’ 


(18) 


where  v  =  ,  m  is  the  error  norm  parameter  and 


W(v)  =  diag{iTHi(v),  •  •  •  ,iTHp(v)}.  (19) 


However,  with  this  cost  function  alone,  the  optimization 
problem  might  be  still  ill-posed;  in  particular,  when  the  width 
of  the  PSF  is  large.  Therefore,  we  introduce  a  regularization 
term  which  further  restricts  the  solution  space  of  the  signal 
of  interest.  To  this  end,  since  the  Taylor  expansion  locally 
represents  the  desired  signals,  we  have  another  approximation 
for  the  true  image  U  directly: 

+U^pl  +  +  Hxl'v'i  +  ■  ■  •  ) 

=  (20) 


with 


I  = 


I  I,. 


(21) 


The  above  approximation  gives  the  regularization  term  with 
weights. 


Vi  V2 


TJlr 

W^(v)’ 


(22) 


where  rrir  is  the  error  norm  parameter  and  Wr(v)  is  the 
weight  matrix  for  this  term. 

In  summary,  the  overall  optimization  problem  is  formulated 
as 


minC(U)  = 

u 


mm{CL(U)  +  ACfl(U)} 


min 

u 


EE 


m 

W(v) 


u-  sr’'isr’'=iu 

TYlr  '] 

X\  X2 

W,(v)  J 

(23) 


where  A  is  the  regularization  parameter.  We  solve  the  opti¬ 
mization  problem  using  the  steepest  descent  method: 


fjg+i)  =  -f  V 


9C(U) 

5U 


D=U(^) 


(24) 


where  v  is  the  step  size.  When  using  the  data-adapted  kernel 
functions,  we  need  to  calculate  the  weight  matrices  W  and 
Wr  beforehand.  In  [2],  we  proposed  an  iterative  way  to  rehne 


(a)  Original 


(b)  The  blurred  image 


Fig.  1.  The  original  image  and  the  image  bluiTed  with  a  5  x  5  Gaussian 
PSF  with  f7  =  1.5.  The  corresponding  RMSE  is  11.48. 


the  weights.  However,  in  the  case  of  using  the  steepest  descent 
method  here  (or  other  iterative  methods  to  minimize  the  cost 
function  (^(U)),  after  the  initialization,  we  will  update  U  and 
the  weight  matrices  alternately  in  each  iteration^. 

There  are  two  points  that  we  must  not  ignore  about  the 
regularization  cost  function  (22).  First,  the  regularization 
term  is  general  enough  to  subsume  several  other  popular 
regularization  terms  existing  in  the  literature.  In  particular, 
when  we  choose  the  regression  order  N  =  0,  (22)  becomes 

EE||SI-St.”'S.TSl||”'  .  (25) 

Vi  V2 

For  TTir  =  2  and  nir  =  1,  (25)  can  be  regarded  as  Tikhonov 
and  digital  TV,  respectively,  with  Wr(v)  =  1,  juil  <  1,  and 
IU2I  <  1.  Furthermore,  for  =  1,  the  formulation  (25) 
is  well  known  as  Bilateral  Total  Variation  (BTV)  [16]  with 
Wr(v)  =  where  0  <  a  <  1.  That  is  to  say,  the 

kernel  function  for  BTV  is 

-  x)  =  (26) 

where  a  is  the  global  smoothing  parameter  in  this  case.  Of 
course,  other  choices  of  the  kernel  function  are  also  possible. 
In  the  literature,  the  data-adaptive  versions  of  the  kernel 
function  have  recently  become  popular  for  image  restoration. 
The  kernel  (or  weight)  functions  of  Bilateral  hlter  [12],  [13], 
Mean-Shift  [17],  [18],  and  Non-Local  Mean  [19]  are  typical 
examples,  and  those  kernel  functions  are  usable  for  (23)  as 
well. 

Second,  all  the  related  regularization  terms  and  methods 
discussed  above  are  zeroth  order  Taylor  approximations  (i.e. 
they  neglect  the  higher  order  derivatives).  Hence,  the  esti¬ 
mated  images  often  tend  to  appear  piecewise  constant.  On  the 
other  hand,  the  regularization  term  in  (22)  can  naturally  take 
higher  order  derivatives  into  account,  resulting  in  estimated 
images  with  more  detailed  texture. 

HI.  Experiments 

Using  the  eye  section  of  the  Lena  image,  which  is  shown 
in  Fig.  1(a),  we  create  a  blurred  image,  shown  in  Fig.  1(b), 

^Alternatively,  one  can  consider  updating  the  weight  matrices  every  few 
iterations. 


(a)  The  degraded  image 


(b)  Total  Variation 


(c)  ForWaRD  [20] 


(d)  Classic  kernel  deblurring 


(e)  Bilateral  kernel  deblurring  (f)  Steering  kernel  deblurring 

Fig.  2.  The  deblumng  simulation  with  small  amount  of  noise:  (a)  the 
blurred  noisy  image  (BSNR=40[dB]),  (b)  the  image  deblurred  with  total 
variation  (A  =  0.0001),  (c)  ForWaRD  [20],  (d)  the  image  deblun'ed  with 
classic  kernel  (TV  =  2,  m  =  2,  mr  =  2^h  =  0.25,  A  =  0.5),  (e)  the  image 
deblurred  with  bilateral  kernel  (TV  =  2,  m  =  2,  mr  =  2,h  =  0.25,  hr  = 
40.0,  A  =  0.5),  and  (f)  the  image  deblurred  with  steering  kernel  (TV  = 
2,m  =  2,mr  =  2,h  =  0.15,  A  =  0.5,  one  iteration).  The  con'esponding 
RMSE’s  are  (a)  11.49,  (b)  6.38,  (c)  6.95  (d)  6.01,  (e)  5.96,  and  (f)  5.99. 


by  convolving  with  a  5  x  5  Gaussian  PSF  with  a  standard 
deviation  of  1.5.  The  resulting  RMSE  for  the  blurred  image 
is  1 1.48.  With  the  blurry  Lena  image,  we  did  simulations  with 
two  different  noise  levels. 

First,  we  added  white  Gaussian  noise  with  BSNR  = 
40[dB]^,  which  gives  us  the  image  shown  in  Fig. 2(a).  The 
estimated  images  by  the  TV  method  (A  =  0.0001),  For- 
WaRD"^  [20],  the  classic  kernel  deblurring  (N  =  2,  m  — 
2,  rrir  =  2,h  —  0.25,  A  =  0.5),  the  bilateral  kernel  deblurring 
(N  =  2,  m  =  2,  nir  =  2,h  =  0.25,  hr  =  40.0,  A  =  0.5),  and 
the  steering  kernel  deblurring  (N  =  2,m  =  2,  nrir  =  2,h  = 

^Bluired  Signal  to  Noise  Ratio  =  10  log  ^ 

^The  software  is  available  at  http://www-dsp.rice.edu/software/ward.shtml. 
We  used  the  default  parameters  which  are  suggested  by  the  authors. 


(a)  The  degraded  image 


(b)  ForWaRD  [20] 


(c)  BTV  [16]  (d)  Steering  kernel  deblurring 

Fig.  3.  The  deblurring  simulation  with  large  amount  of  noise:  (a)  the  blurred 
noisy  image  (BSNR=20[dB]),  (b)  ForWaRD  [20],  (c)  Bilateral  Total  Variation 
[16],  and  (d)  steering  kernel  deblurring  (N  =  2,m  =  2,mr  =  l,h  = 
1.4,  A  =  0.5).  The  corresponding  RMSE’s  are  (a)  12.36,  (b)  9.55,  (c)  8.91, 
and  (d)  8.59. 


0.15,  A  =  0.5)  are  illustrated  in  Figs.2(b)-(f),  respectively. 
The  corresponding  RMSF’s  are  (a)  11.49,  (b)  6.38,  (c)  6.95 
(d)  6.01,  (e)  5.96,  and  (f)  5.99.  In  this  case,  the  estimated 
images  are  (at  least  visually)  very  similar. 

Second,  we  set  the  amount  of  noise  to  BSNR  =  20  [dB]  to 
create  the  blurry  and  noisy  image  shown  in  Fig. 3(a).  The  es¬ 
timated  image  by  ForWaRD  [20],  BTV  [16],  and  the  steering 
kernel  deblurring  (N  =  2,m  =  2,  rur  =  1,  h  =  1.4,  A  =  0.5) 
are  illustrated  in  Fig.3(b),  Fig. 3(c),  and  Fig. 3(d),  respectively. 
The  corresponding  RMSF’s  are  (a)  12.36,  (b)  9.55,  (c)  8.91, 
and  (d)  8.59.  The  steering  kernel  is  the  best  in  this  simulation. 


IV.  Conclusion  and  Future  Work 

This  paper  presented  a  simultaneous  denoising  and  de¬ 
blurring  method  based  on  the  kernel  regression  framework, 
explained  how  it  is  related  to  other  methods,  and  demonstrated 
its  performance  using  simulated  data  sets.  The  simulations  in¬ 
dicated  the  superiority  of  our  methods  in  the  most  challenging 
scenarios;  namely,  for  very  noisy  deblurring  problems. 

We  applied  the  method  for  2-dimensional  data.  However, 
it  is  also  possible  to  generalize  this  method  to  deal  with 
higher-dimensional  data.  Moreover,  aside  from  single  frame 
deblurring  and  denoising,  the  proposed  method  is  also  ap¬ 
plicable  to  irregularly  sampled  data  sets  and  to  multi-frame 
deconvolution  problems  as  well. 


Appendix  I 

Taylor  Approximation  in  Vector  Form 

The  Taylor  expansion  in  the  1-D  case  is 

z{xi)  «  z{x)  +  z\x){xi  —  x)  +  -z"{x){xi  —  cc)  +  •  •  •  .  (27) 

When  we  have  P  positions  of  interest,  Z  = 
[z{xi),---  ,z{xp)]'^  is  a  vector  composed  of  the  data 
set  of  interest.  By  defining  v  =  Xi  —  x,  the  nearby  values 
for  each  position  are  approximately  expressed  by  using  the 
Taylor  expansion  as 


z{xi+v) 

z{xi) 

z'{xi) 

V'(xi)- 

Rd 

+ 

V  + 

_z{Xp+v)_ 

_z{Xp)_ 

z'{Xp)_ 

y'fe). 

(28) 

which  in  vector  form  is 

Z  +  Z'z;  +  Z"y +  •••  ,  (29) 

where  S’'  is  the  warping  operator.  Finally,  we  have  the  Taylor 
expansion  in  matrix  form 
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